fix(run-engine): fix queue cache memory leak and replace MemoryStore with LRU cache #2945
Conversation
Walkthrough

Introduces a new LRU-backed in-memory cache implementation (LRUMemoryStore, createLRUMemoryStore) and exports it from internal-packages/cache, adds the lru-cache dependency and a Vitest config, and adds comprehensive tests for the LRUMemoryStore. Replaces many in-process MemoryStore usages across the codebase with createLRUMemoryStore(...) calls (webapp services, run-engine, queues, rate-limiting, idempotency, etc.). Also updates one private method signature in runAttemptSystem to change how queue cache keys are derived and used.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Force-pushed from 8bd2b77 to 2871f4a.
…with LRU cache (triggerdotdev#2945)

- Fix a memory leak in the `RunAttemptSystem` queue cache, which was keying entries by `runId` instead of by queue identifier
- Replace the `@unkey/cache` MemoryStore with a new LRUMemoryStore for O(1) operations and better memory bounds
## Problem
### Cache Key Bug

The queue cache in `#resolveTaskRunExecutionQueue` was keyed by `runId`, creating one cache entry per run instead of one per queue. With 1-2 hour TTLs and a 5000-entry soft cap, these entries accumulated and caused memory growth.
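A minimal sketch of the failure mode, with hypothetical names and a plain Map standing in for the cache:

```typescript
// Hypothetical illustration of the leak; names and shapes are not from the PR.
const queueCache = new Map<string, { queueName: string }>();
const runs = Array.from({ length: 1000 }, (_, i) => ({ id: `run_${i}` }));
const queueRecord = { queueName: "my-queue" };

// Before: key = runId. 1000 runs on one queue produce 1000 entries,
// each held for its full 1-2 hour TTL before it can expire.
for (const run of runs) {
  queueCache.set(run.id, queueRecord);
}
console.log(queueCache.size); // 1000

// After: key = queue identifier. The same 1000 runs share one entry.
queueCache.clear();
runs.forEach(() => queueCache.set(`env_123:${queueRecord.queueName}`, queueRecord));
console.log(queueCache.size); // 1
```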
### MemoryStore Performance

The `@unkey/cache` MemoryStore uses O(n) synchronous iteration for eviction, blocking the event loop at high throughput.
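A sketch of the problem shape (illustrative only, not the actual `@unkey/cache` source):

```typescript
// A Map-backed store with a size cap must scan entries synchronously to
// find candidates to evict, and the scan runs on the event loop.
function evictExpired(map: Map<string, { expiresAt: number }>): void {
  const now = Date.now();
  for (const [key, entry] of map) {
    // O(n): every eviction pass touches every cached entry.
    if (entry.expiresAt <= now) {
      map.delete(key);
    }
  }
}
// An LRU list keeps entries in recency order, so evicting the least
// recently used entry on insert is O(1) and never scans.
```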
## Solution

### Cache Key Fix

Changed the cache key from `params.runId` to a queue identifier:

```typescript
const cacheKey = params.lockedQueueId ?? `${params.runtimeEnvironmentId}:${params.queueName}`;
```
### LRU Cache

Created an LRUMemoryStore adapter using the `lru-cache` package (sketched after this list):

- O(1) get/set/delete operations
- Strict memory bounds (a hard max instead of a soft cap)
- No event loop blocking
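A minimal sketch of the adapter idea, assuming simplified store methods (the real LRUMemoryStore implements the `@unkey/cache` Store interface, which has a different shape):

```typescript
import { LRUCache } from "lru-cache";

// Simplified store built on lru-cache, to show where the bounds come from.
function createLRUMemoryStore<TValue extends {}>(opts: { max: number; ttlMs?: number }) {
  const cache = new LRUCache<string, TValue>({
    max: opts.max,        // hard cap: the LRU entry is evicted in O(1) when full
    ttl: opts.ttlMs,      // optional time-based expiry, checked per access
    updateAgeOnGet: true, // reads refresh recency so hot entries stay cached
  });

  return {
    get: (key: string) => cache.get(key),
    set: (key: string, value: TValue) => void cache.set(key, value),
    remove: (key: string) => void cache.delete(key),
  };
}

// Usage (keys and sizes are illustrative):
const store = createLRUMemoryStore<{ queueName: string }>({ max: 5000, ttlMs: 60_000 });
store.set("env_123:my-queue", { queueName: "my-queue" });
```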
## Test Results

| Metric | Before Fix | After Fix |
|---|---|---|
| Queue cache entries (per 1000 runs) | ~1000 | 1 |
| Old space growth | 32.27 MB | 5.45 MB |
| Heap growth | 7.21 MB (3.9%) | 4.81 MB (2.6%) |
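Call-site migration is then mechanical; a hedged sketch, with the internal import path and factory options assumed rather than taken from the PR:

```typescript
import { MemoryStore } from "@unkey/cache/stores";
import { createLRUMemoryStore } from "@internal/cache"; // hypothetical import path

// Before: Map-backed store with a soft cap and O(n) eviction.
const before = new MemoryStore({ persistentMap: new Map() });

// After: hard-capped, O(1) LRU-backed store (option name is an assumption).
const after = createLRUMemoryStore({ max: 5000 });
```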